Diagnosing and Improving Topic Models by Analyzing Posterior Variability
Abstract
Bayesian inference methods for probabilistic topic models can quantify uncertainty in the parameters, which has primarily been used to increase the robustness of parameter estimates. In this work, we explore what other useful information can be obtained by analyzing the posterior distributions in topic models. Experimenting with latent Dirichlet allocation on two datasets, we propose ways to incorporate information about the posterior distributions at the topic level and at the word level. At the topic level, we propose a metric called topic stability that measures the variability of the topic parameters under the posterior. We show that this metric is correlated with human judgments of topic quality as well as with the consistency of topics appearing across multiple models. At the word level, we experiment with different methods for adjusting individual word probabilities within topics based on their uncertainty. Humans prefer words ranked by our adjusted estimates nearly twice as often as words ranked by the traditional approach. Finally, we describe how the ideas presented in this work could potentially be applied to other predictive or exploratory models in future work.
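The abstract does not give formulas for either idea, so the following is only a rough illustration under stated assumptions. It assumes we already have several posterior samples of one topic's word distribution (for example, from different Gibbs sampling iterations) and uses two stand-in definitions that are our own assumptions, not necessarily the authors': topic stability as the mean cosine similarity between each sample and the posterior mean, and an adjusted word score equal to the posterior mean minus k posterior standard deviations. All function names and the toy data are hypothetical.

```python
import math
from statistics import fmean, pstdev

# Each posterior sample is one topic's word distribution: a list of
# probabilities over the same vocabulary order. Names are hypothetical.


def posterior_mean(samples):
    """Average each word's probability across the posterior samples."""
    return [fmean(s[w] for s in samples) for w in range(len(samples[0]))]


def cosine(u, v):
    """Cosine similarity between two non-negative vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def topic_stability(samples):
    """Assumed stability score: mean cosine similarity of each sample to the
    posterior mean. It is 1.0 when all samples agree and drops as they vary."""
    mean = posterior_mean(samples)
    return fmean(cosine(s, mean) for s in samples)


def adjusted_scores(samples, k=1.0):
    """Assumed word-level adjustment: posterior mean minus k posterior
    standard deviations, so uncertain words are pushed down the ranking."""
    scores = []
    for w in range(len(samples[0])):
        values = [s[w] for s in samples]
        scores.append(fmean(values) - k * pstdev(values))
    return scores


def rank_words(vocab, scores, top_n=3):
    """Return the top_n vocabulary words ordered by descending score."""
    order = sorted(range(len(vocab)), key=lambda w: scores[w], reverse=True)
    return [vocab[w] for w in order[:top_n]]


# Toy example: three made-up posterior samples for one topic.
vocab = ["alpha", "beta", "gamma"]
samples = [
    [0.50, 0.30, 0.20],
    [0.10, 0.35, 0.55],
    [0.30, 0.35, 0.35],
]

print(rank_words(vocab, posterior_mean(samples)))   # ['gamma', 'beta', 'alpha']
print(rank_words(vocab, adjusted_scores(samples)))  # ['beta', 'gamma', 'alpha']
print(round(topic_stability(samples), 3))
```

On this toy input, ranking by the posterior mean alone puts gamma first, while the adjusted score promotes beta, whose probability barely changes across samples. The parameter k controls how strongly uncertainty is penalized; k = 0 recovers the plain posterior-mean ranking.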
Similar resources
Bayesian Checking for Topic Models
Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the ...
Hierarchical Bayesian Modeling of Human Decision-Making Using Wiener Diffusion
Wiener diffusion accounts of human decision-making are among the most successful and best developed formal models in the psychological sciences. We reconsider these models from a Bayesian perspective, using graphical modeling, and Markov Chain Monte-Carlo methods for posterior sampling. By analyzing seminal data from a brightness discrimination task, we show how the Bayesian approach offers sev...
Estimating Tumor/Non-Tumor Uptake from Radiolabeled Monoclonal Antibodies using Scintigraphic Images and Dissecting the Animal Models
Introduction: Biodistribution study in animal models bearing tumors is one of the most important procedures in evaluation of fractional uptake of radiopharmaceuticals in the tumor and non-tumor organs. The aim of this study was to develop a new software-based method to determine activities that accumulate in the main organs as well as the tumor based on scintigraphy images, thereby obviating th...
Image Segmentation Using Gaussian Mixture Model
Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models play a key role in probabilistic data analysis. In this paper, we fit a Gaussian mixture model to the pixels of an image. The parameters of the model were estimated by the EM algorithm. In addition, the label of each pixel of the true image was assigned by Bayes' rule. In fact, ...
Publication date: 2017